10. Different types of Spark Functions
Transformations and Actions
There are two types of functions in Spark:
- Transformations
- Actions
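To make the two buckets a bit more concrete, here is a rough, non-exhaustive split of common DataFrame methods (assuming the DataFrame API; the same idea applies to RDD methods):

```python
# Transformations (lazy): return a new DataFrame and only extend the query plan.
#   df.select(...), df.filter(...), df.withColumn(...), df.join(...), df.groupBy(...).agg(...)

# Actions (eager): make Spark actually run the plan and submit a job.
#   df.count(), df.collect(), df.take(n), df.show(), df.write.save(...)
```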
Spark evaluates RDDs and DataFrames lazily. Lazy evaluation means the code is not executed until its result is actually needed, and it is the _action_ functions that trigger the lazily built-up transformations.
For example:

```python
df = spark.read.format("csv").load("some csv file")
df1 = df.select("some column").filter("some condition")
df1.write.save("to path")
```
- In this code, `select` and `filter` are transformation functions, and `write` is an action function.
- If you execute this code line by line, the second line will run, but you will not see those functions being executed in your Spark UI.
- When you actually execute the action `write`, you will see your Spark program being executed as `select` --> `filter` --> `write` chained together in the Spark UI, but you will only see `write` show up under your tasks. (A minimal runnable sketch of this behaviour follows this list.)
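Here is a minimal, self-contained sketch of the same pattern. Since the file path and column names above are placeholders, this version uses a synthetic DataFrame from `spark.range` and an example output path instead: the `explain()` call shows the plan Spark has assembled so far, and only the final `write` submits a job that appears in the UI.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").appName("lazy-eval-demo").getOrCreate()

# Transformations: these only build up a logical plan, no job is submitted yet.
df = spark.range(1_000_000)                        # a DataFrame with a single "id" column
df1 = df.select("id").filter(F.col("id") % 2 == 0)

# You can inspect the plan Spark has built so far without executing anything.
df1.explain()

# Action: this is the point where Spark actually schedules tasks,
# and where a job shows up in the Spark UI.
df1.write.mode("overwrite").parquet("/tmp/lazy_eval_demo")

spark.stop()
```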
This is significant because you can chain transformations on your RDD or DataFrame as much as you want, but nothing is actually computed until you trigger an action. And if you have a lengthy chain of transformations, it might take your executors quite some time to complete all the tasks once that action runs.
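If you want to see where the time actually goes, a rough timing sketch like the one below (again using a synthetic DataFrame) shows that the transformation lines return almost immediately, while essentially all of the wall-clock cost lands on the final action:

```python
import time
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[*]").appName("timing-demo").getOrCreate()

start = time.time()
df = spark.range(5_000_000)
df1 = (df.withColumn("squared", F.col("id") * F.col("id"))
         .filter(F.col("id") % 3 == 0)
         .groupBy((F.col("id") % 100).alias("bucket"))
         .agg(F.sum("squared").alias("total")))
print(f"Building the plan took {time.time() - start:.3f}s")   # near-instant

start = time.time()
result = df1.collect()   # the action: all of the work above runs here
print(f"Running the action took {time.time() - start:.3f}s")

spark.stop()
```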